Abstract



We develop a unified approach to the problem of clustering in the three different fields of
applications indicated in the title of the paper, in the case when the parametric function of
the models is regularly varying with positive exponent. The approach is based on Khintchine's
probabilistic method that grew out of the Darwin-Fowler method in statistical physics. Our
main result is the derivation of asymptotic formulae for the distribution of the largest and the
smallest clusters (= components), as the total size of a structure (= number of particles) goes
to infinity. We discover that $n^{\frac{1}{l+1}}$ is the threshold for the limiting distribution of the largest
cluster. As a by-product of our study, we prove the independence of the numbers of groups
of fixed sizes, as $n \to \infty$. This is in accordance with the general principle of asymptotic
independence of sites in mean-field models. The latter principle is commonly accepted in
statistical physics, but not rigorously proved.
1 Introduction: The objective and the context.



We develop a unified approach to the problem of clustering in the three different fields of
applications indicated in the title of the paper. The approach is based on Khintchine's
probabilistic method that grew out of the Darwin-Fowler method in statistical physics. To the best
of our knowledge, the first application of Khintchine's method for coagulation-fragmentation
processes was made in [21], where it was used for the derivation of asymptotic formulae for
the partition function of the invariant measure of the process. The present paper extends the
method to much more complicated asymptotic problems arising in the study of clustering.
Our main result is the derivation of asymptotic formulae for the distribution of the largest and
the smallest clusters (= components), as the total size of a structure (= number of particles)
goes to infinity.

The organization of the paper is as follows. Section 2 provides a formal mathematical setting
that encompasses the clustering problems arising in the contexts of coagulation-fragmentation
processes, random combinatorial structures and additive number systems. The mathematical
problem is stated as follows. Let the functions $g, S : \mathbb{C} \to \mathbb{C}$ be related via $g(z) = e^{S(z)}$, $|z| < R$,
$R > 0$. Under a given asymptotic behavior of the Taylor coefficients $\{a_n\}$ of the function $S$
one must explore the asymptotic behavior of certain quantities related to the Taylor coefficients
$\{c_n\}$ of the function $g$.

The problem is considered for the class of functions $S$ such that $a_n \sim n^{l-1}L(n)$, $l > 0$, $n \to \infty$,
where $L$ is a slowly varying function. A specific feature of this class of functions is that it
provides the validity of the normal local limit theorem for the associated probabilistic model.
In Section 3 we explain the idea of Khintchine's method and apply it to the derivation of the
asymptotic formulae for the limiting distributions of the largest and the smallest clusters. We
find that $n^{\frac{1}{l+1}}$ is the threshold for the limiting distribution of the largest cluster.
In Sections 4-6 we demonstrate how to interpret these asymptotic formulae in the context
of the aforementioned three fields, and we provide a description of the striking picture of
clustering that follows from these formulae. It turns out that for large $n$, almost all weight
of $n$ is distributed into groups of sizes about $n^{\frac{1}{l+1}}$, while the rest of the weight is made up of
groups of small sizes. As a by-product of our study:

(i) We prove the independence of the numbers of groups of fixed sizes, as $n \to \infty$. This is
in accordance with the general principle of asymptotic independence of sites in mean-field
models. The latter principle is commonly accepted in statistical physics, but not rigorously
proved.

(ii) We recover an asymptotic result by J. Knopfmacher, A. Knopfmacher and R. Warlimont,
that is widely known in the theory of additive number systems.

2 Mathematical setting and preliminaries. 

We consider throughout the paper the set $\mathcal{F}(l)$, $l > 0$, of sequences $a = \{a_n\}_1^\infty$, $a_n \ge 0$, $n \ge 1$,
with the following asymptotic behavior:

$$a_n \sim n^{l-1}L(n), \quad \text{as } n \to \infty, \quad l > 0. \tag{2.1}$$

Here and in what follows, $L$ is a slowly varying (s.v.) function at infinity (for references see
[37]). We will need the following two asymptotic properties of s.v. functions:

$$L(x) = o(x^{\epsilon}), \quad \text{as } x \to \infty, \text{ for all } \epsilon > 0, \tag{2.2}$$
$$x^{-\epsilon} = o(L(x)), \quad \text{as } x \to \infty, \text{ for all } \epsilon > 0. \tag{2.3}$$

We assume further that 

• $L$ is differentiable on $[0, \infty)$. This is based on the fact ([TT, p. 17) that for any s.v.
function $L$ there exists a s.v. function $\tilde{L}$ that possesses the aforementioned property
and satisfies $L(x) \sim \tilde{L}(x)$, as $x \to \infty$.

• The function $x^{\delta}L(x)$ is locally bounded on $[0, \infty)$, for any $\delta > 0$.

It is easy to derive from the representation of the set of s.v. functions ([37], p. 2) that the
sequences $a \in \mathcal{F}(l)$, $l > 0$, satisfy

$$\lim_{n \to \infty} \frac{a_n}{a_{n+1}} = 1. \tag{2.4}$$

We will also need the fact that a s.v. function $L$ has a conjugate function $L^*$ ([37], p. 25 and
[10], p. 47), which is also a s.v. function and is uniquely defined (up to asymptotic equivalence)
by the asymptotic relationship

$$L^*(x)L(xL^*(x)) \sim L(x)L^*(xL(x)) \sim 1, \quad \text{as } x \to \infty. \tag{2.5}$$

(2.5) says that the asymptotic behavior of $L^*$ is converse to the one of $L$, in the sense that

$$\lim_{n \to \infty} L^*(n) = \Big(\lim_{n \to \infty} L(n)\Big)^{-1}, \tag{2.6}$$
provided the limits exist. The characterization of the class of sequences $\mathcal{F}(l)$, $l > 0$, is given
by the celebrated Karamata Tauberian theorem (for references see jHZ], p. 59 and p. 423)
that is a widely used tool in different fields of probability.
In effect, we will employ the following corollary of Karamata's theorem.

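Theorem 1 (JJBI, p. 423)

Let a sequence $a = \{a_n \ge 0,\ n \ge 1\}$ be ultimately monotone, and suppose that the radius of
convergence of the power series (in $z$)

$$S(z) = \sum_{n=1}^{\infty} a_n z^n, \quad z \in \mathbb{C}, \tag{2.7}$$

equals 1. Then the two conditions (i) and (ii) are equivalent:

(i)
$$S(z) \sim \frac{\Gamma(l)}{(1-z)^{l}}\, L\Big(\frac{1}{1-z}\Big), \quad l > 0, \quad \text{as } z \to 1^-, \tag{2.8}$$
where $\Gamma$ is the gamma function, and

(ii)
$$a_n \sim n^{l-1}L(n), \quad \text{as } n \to \infty, \quad l > 0. \tag{2.9}$$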

For our subsequent study we will make use only of the abelian part (i) of the above theorem.
Next we define the sequence $c = \{c_n\}_0^\infty$ generated by the above sequence $a$ in the following
manner:

$$g(z) := \sum_{n=0}^{\infty} c_n z^n = e^{S(z)}, \quad |z| < 1. \tag{2.10}$$

We will demonstrate in Sections 4-6 that the above form of the exponential relationship
between two generating functions arises in the three fields in the title of the present paper.
In view of this, a variety of problems related to (2.10) (but quite different from the problem
considered by us) have been studied by many researchers.
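The exponential relationship (2.10) can be made concrete with a small computation. The following is a minimal sketch, not taken from the paper: differentiating $g = e^{S}$ gives $g' = S'g$, hence $n c_n = \sum_{k=1}^{n} k a_k c_{n-k}$, which yields the coefficients $c_n$ from the $a_n$. The function name `exp_series_coefficients` and the test sequences are illustrative assumptions.

```python
from fractions import Fraction


def exp_series_coefficients(a, n_max):
    """Return [c_0, ..., c_{n_max}] with sum_n c_n z^n = exp(sum_j a[j] z^j).

    `a` maps j >= 1 to a_j; missing indices are treated as 0.
    Uses the recursion n * c_n = sum_{k=1}^{n} k * a_k * c_{n-k},
    which follows from g' = S' g (an assumption-free identity for g = exp(S)).
    """
    c = [Fraction(1)]  # c_0 = exp(S(0)) = 1
    for n in range(1, n_max + 1):
        total = sum(k * a.get(k, 0) * c[n - k] for k in range(1, n + 1))
        c.append(Fraction(total) / n)
    return c


# Sanity check with a_j = 1/j: S(z) = -log(1 - z), so g(z) = 1/(1 - z), all c_n = 1.
a_log = {j: Fraction(1, j) for j in range(1, 11)}
c_log = exp_series_coefficients(a_log, 10)

# Illustrative case a_j = 1, i.e. g(z) = exp(z / (1 - z)).
c_ones = exp_series_coefficients({j: 1 for j in range(1, 7)}, 6)
```

Exact rational arithmetic is used here only to make small cases easy to check by hand; for large $n$ floating point would be the more practical choice.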

Based on (2.4), it is easy to derive (see ^21 and .Lemma 1.22) that the radius of convergence
of the series for $g(z)$ equals 1. Moreover, it was recently proven by J. Bell and S. Burris ([S],
Lemma 4.2) that (2.4) implies

$$\lim_{n \to \infty} \frac{c_n}{c_{n+1}} = 1. \tag{2.11}$$
This fact is important, since by Compton's density theorem (see for references jH] and
Ch. 4), the condition (2.11) implies that all partition sets of an additive number system have
asymptotic density which is either 0 or 1.

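To formulate the problem of clustering that is addressed in the present paper we introduce
some more notation. For given $r, n$, $1 \le r = r(n) \le n$, $n = 1, 2, \ldots$, we denote

$$\underline{S}_n^{(r)}(z) = \sum_{j=1}^{r} a_j z^j, \qquad \overline{S}_n^{(r)}(z) = \sum_{j=r}^{n} a_j z^j, \qquad |z| < 1, \tag{2.12}$$

and consider the two power series

$$\underline{g}_n^{(r)}(z) = e^{\underline{S}_n^{(r)}(z)} := \sum_{j=0}^{\infty} \underline{c}_j^{(r)} z^j \quad \text{and} \quad \overline{g}_n^{(r)}(z) = e^{\overline{S}_n^{(r)}(z)} := \sum_{j=0}^{\infty} \overline{c}_j^{(r)} z^j, \qquad |z| < 1. \tag{2.13}$$

Setting $r = n^{\beta}$, $0 < \beta \le 1$, and denoting $\underline{c}_n^{(n)} = c_n$, $n = 1, 2, \ldots$, our ultimate objective will
be the derivation of the limits, as $n \to \infty$, for the two quantities

$$\underline{d}_n^{(r)} := \frac{\underline{c}_n^{(r)}}{c_n} \quad \text{and} \quad \overline{d}_n^{(r)} := \frac{\overline{c}_n^{(r)}}{c_n}. \tag{2.14}$$

Here and in what follows we agree that $r = \cdot$ means that $r = [\cdot]$, where $[\cdot]$ is the integer part
of the number $\cdot$.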
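For small $n$ the two ratios in (2.14) can be computed exactly, which may help to fix the notation. The sketch below assumes the reading of (2.12)-(2.14) in which the lower-bar quantities use $a_1, \ldots, a_r$ and the upper-bar quantities use $a_r, \ldots, a_n$; the helper names `exp_coeffs` and `cluster_ratios` are hypothetical.

```python
from fractions import Fraction


def exp_coeffs(a, n_max):
    """Coefficients c_0..c_{n_max} of exp(sum_j a[j] z^j), via n c_n = sum_k k a_k c_{n-k}."""
    c = [Fraction(1)]
    for n in range(1, n_max + 1):
        total = sum(k * a.get(k, 0) * c[n - k] for k in range(1, n + 1))
        c.append(Fraction(total) / n)
    return c


def cluster_ratios(a, n, r):
    """Return (d_low, d_up): n-th coefficients of the two truncated series over c_n.

    `a` must provide a[j] for every j in 1..n, and 1 <= r <= n.
    d_low keeps only a_1..a_r, d_up keeps only a_r..a_n.
    """
    full = exp_coeffs(a, n)[n]
    low = exp_coeffs({j: a[j] for j in range(1, r + 1)}, n)[n]
    up = exp_coeffs({j: a[j] for j in range(r, n + 1)}, n)[n]
    return low / full, up / full


# Illustrative sequence a_j = 1 (an assumption chosen for easy hand-checking), n = 6.
a = {j: Fraction(1) for j in range(1, 7)}
d_low_1, d_up_1 = cluster_ratios(a, 6, 1)
d_low_6, d_up_6 = cluster_ratios(a, 6, 6)
```

With $r = 1$ the lower series is $e^{a_1 z}$, so its $n$-th coefficient is $a_1^n/n!$, while with $r = n$ the upper series is $e^{a_n z^n}$, whose $n$-th coefficient is $a_n$.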



3 Asymptotic formulae and limiting laws 

We will study the above posed problem with the help of the probabilistic method formulated
by Khintchine in [21], Ch. IV, V (see also [20]). Independently of the context of the problem
considered, the implementation of Khintchine's method for deriving asymptotic formulae
always follows the same two-step scheme:

(i) The construction of an auxiliary probabilistic model with a free parameter that enables
one to express the quantity in question via the probability function of a sum of independent
integer-valued random variables forming a triangular array.

(ii) The proof of the normal local limit theorem via a proper choice of a free parameter in the
probabilistic model in (i).

The problem formulated in (2.14) requires the derivation of asymptotic formulae for the
coefficients $\underline{c}_n^{(r)}$ and $\overline{c}_n^{(r)}$, for all $r = n^{\beta}$, $0 < \beta \le 1$. In the case $L(x) = 1$ such a formula for
$c_n$ was established in [20], with the help of Khintchine's method. Our primary aim in this
section will be to extend the method to the aforementioned problem (2.14). This will require
a much more complicated asymptotic analysis.

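The probabilistic model suggested below is a modification of the one in [20]. We start by
setting in (2.12), (2.13)

$$z = e^{-\sigma + 2\pi i \alpha}, \tag{3.1}$$
for some $\sigma, \alpha \in \mathbb{R}$. Then, analogous to Lemma 1 in [20], the following representations of $\underline{c}_n^{(r)}$
and $\overline{c}_n^{(r)}$ are valid:

$$\underline{c}_n^{(r)} = e^{n\sigma} \int_0^1 \prod_{j=1}^{r} \Big( \sum_{k=0}^{\infty} \frac{a_j^k e^{-jk\sigma + 2\pi i \alpha jk}}{k!} \Big) e^{-2\pi i \alpha n}\, d\alpha,$$
$$\overline{c}_n^{(r)} = e^{n\sigma} \int_0^1 \prod_{j=r}^{n} \Big( \sum_{k=0}^{\infty} \frac{a_j^k e^{-jk\sigma + 2\pi i \alpha jk}}{k!} \Big) e^{-2\pi i \alpha n}\, d\alpha, \quad 1 \le r \le n, \quad n = 1, 2, \ldots, \tag{3.2}$$

where $\sigma \in \mathbb{R}$ is arbitrary. For this reason $\sigma$ is called a free parameter. It plays an important
role in the method.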

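To attribute a probabilistic meaning to the RHS's in (3.2), we make use of the following
notation:

$$p_{jk} = \frac{a_j^k e^{-jk\sigma}}{k! \exp\big(a_j e^{-j\sigma}\big)}, \quad j = 1, \ldots, n, \quad k = 0, 1, \ldots, \tag{3.3}$$

$$\varphi_j(\alpha) = \sum_{k=0}^{\infty} p_{jk} e^{2\pi i \alpha jk}, \quad \alpha \in \mathbb{R}, \quad 1 \le j \le n, \tag{3.4}$$

$$\underline{\varphi}^{(r)}(\alpha) = \prod_{j=1}^{r} \varphi_j(\alpha), \qquad \overline{\varphi}^{(r)}(\alpha) = \prod_{j=r}^{n} \varphi_j(\alpha), \quad \alpha \in \mathbb{R}. \tag{3.5}$$

Notice that for a given $j$, (3.3) can be viewed as the Poisson probability function with
parameter $a_j e^{-j\sigma}$, $\sigma \in \mathbb{R}$. Now (3.2) can be rewritten as

$$\underline{c}_n^{(r)} = \exp\Big( \underline{S}_n^{(r)}(e^{-\sigma}) + n\sigma \Big) \int_0^1 \underline{\varphi}^{(r)}(\alpha) e^{-2\pi i \alpha n}\, d\alpha, \quad 1 \le r \le n, \quad n = 1, 2, \ldots, \tag{3.6}$$

$$\overline{c}_n^{(r)} = \exp\Big( \overline{S}_n^{(r)}(e^{-\sigma}) + n\sigma \Big) \int_0^1 \overline{\varphi}^{(r)}(\alpha) e^{-2\pi i \alpha n}\, d\alpha, \quad 1 \le r \le n, \quad n = 1, 2, \ldots \tag{3.7}$$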

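The representations (3.6), (3.7) belong to the core of Khintchine's method. The idea behind
the representations is that $\underline{\varphi}^{(r)}(\alpha)$ in (3.6) can be interpreted as the characteristic function of
the sum $\underline{Y}_n^{(r)} = X_1 + \ldots + X_r$ of independent lattice random variables $X_1, \ldots, X_r$, $1 \le r \le n$,
defined by

$$\Pr(X_j = jk) = p_{jk}, \quad j = 1, \ldots, r, \quad k = 0, 1, \ldots \tag{3.8}$$

Hence,

$$\underline{T} = \underline{T}_n^{(r)} := \int_0^1 \underline{\varphi}^{(r)}(\alpha) e^{-2\pi i \alpha n}\, d\alpha = \Pr\big(\underline{Y}_n^{(r)} = n\big). \tag{3.9}$$

Analogously, writing $\overline{Y}_n^{(r)} = X_r + \ldots + X_n$, $1 \le r \le n$, we get

$$\overline{T} = \overline{T}_n^{(r)} := \int_0^1 \overline{\varphi}^{(r)}(\alpha) e^{-2\pi i \alpha n}\, d\alpha = \Pr\big(\overline{Y}_n^{(r)} = n\big). \tag{3.10}$$

In view of (3.6), (3.7) and (3.9), (3.10), we will focus now on finding the asymptotic behavior
of the probabilities $\Pr(\underline{Y}_n^{(r)} = n)$ and $\Pr(\overline{Y}_n^{(r)} = n)$, as $n \to \infty$.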
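The identity obtained by combining (3.6) with (3.9) can be checked numerically for small parameters. The sketch below is an illustration under stated assumptions, not the paper's procedure: it computes $\Pr(X_1 + \ldots + X_r = n)$ by direct convolution of the scaled Poisson laws (3.8), and compares $\exp(\underline{S}_n^{(r)}(e^{-\sigma}) + n\sigma)$ times this probability with the coefficient obtained from the series recursion, for two arbitrary values of the free parameter. All function names and the choice $a_j = 1$ are hypothetical.

```python
import math


def exp_coeffs(a, n_max):
    """Float coefficients of exp(sum_j a[j] z^j), via n c_n = sum_k k a_k c_{n-k}."""
    c = [1.0]
    for n in range(1, n_max + 1):
        c.append(sum(k * a.get(k, 0.0) * c[n - k] for k in range(1, n + 1)) / n)
    return c


def prob_sum_equals(a, r, n, sigma):
    """Pr(X_1 + ... + X_r = n), where X_j / j is Poisson(a_j * exp(-j * sigma))."""
    dist = [1.0] + [0.0] * n  # law of the empty sum, truncated at n
    for j in range(1, r + 1):
        lam = a[j] * math.exp(-j * sigma)
        new = [0.0] * (n + 1)
        for m in range(n + 1):
            if dist[m] == 0.0:
                continue
            p, k = math.exp(-lam), 0  # Poisson mass at k = 0
            while m + j * k <= n:
                new[m + j * k] += dist[m] * p
                k += 1
                p *= lam / k  # Poisson mass at the next k
        dist = new
    return dist[n]


def coefficient_via_probability(a, r, n, sigma):
    """Right-hand side of the representation: exp(S(e^{-sigma}) + n sigma) * Pr(Y = n)."""
    s_val = sum(a[j] * math.exp(-j * sigma) for j in range(1, r + 1))
    return math.exp(s_val + n * sigma) * prob_sum_equals(a, r, n, sigma)


a = {1: 1.0, 2: 1.0, 3: 1.0}  # illustrative choice, r = 3
direct = exp_coeffs(a, 10)[10]
via_sigma_pos = coefficient_via_probability(a, 3, 10, 0.3)
via_sigma_neg = coefficient_via_probability(a, 3, 10, -0.2)
```

Both values of the free parameter give the same coefficient, which is the sense in which the parameter is free.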

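First recall that the classical normal local limit theorems (see j^S], IHE]; p. 78, [2^]) are restricted
to the case of a sum of independent lattice random variables, while in our case, as we will
see later on, the lattice random variables $X_j$ given by (3.8) with $\sigma$ depending on $n$ form a
triangular array. So, even the existence of the limiting probability density for our problem is
in question.

Notwithstanding this, we will demonstrate (Theorem 2 below) that a proper choice of the free
parameter $\sigma$ guarantees a version of the famous Gnedenko local limit theorem.
Let $\underline{M}_n^{(r)} = \underline{M}_n^{(r)}(\sigma) := E\underline{Y}_n^{(r)}$, $(\underline{B}_n^{(r)})^2 = (\underline{B}_n^{(r)})^2(\sigma) := \mathrm{Var}\, \underline{Y}_n^{(r)}$ and $\underline{\rho}_n^{(r)} = \underline{\rho}_n^{(r)}(\sigma) := E\big(\underline{Y}_n^{(r)} - E\underline{Y}_n^{(r)}\big)^3$,
and denote by $\overline{M}_n^{(r)} = \overline{M}_n^{(r)}(\sigma)$, $(\overline{B}_n^{(r)})^2 = (\overline{B}_n^{(r)})^2(\sigma)$ and $\overline{\rho}_n^{(r)} = \overline{\rho}_n^{(r)}(\sigma)$ the same
moments of the sum $\overline{Y}_n^{(r)}$. It follows from (3.8) that $j^{-1}X_j$, $j = 1, \ldots, n$, are Poisson($a_j e^{-j\sigma}$)
random variables. So, we have the following expressions for the above quantities:

$$\underline{M}_n^{(r)} = \sum_{j=1}^{r} j a_j e^{-j\sigma}, \qquad \overline{M}_n^{(r)} = \sum_{j=r}^{n} j a_j e^{-j\sigma}, \quad n = 1, 2, \ldots, \tag{3.11}$$

$$(\underline{B}_n^{(r)})^2 = \sum_{j=1}^{r} j^2 a_j e^{-j\sigma}, \qquad (\overline{B}_n^{(r)})^2 = \sum_{j=r}^{n} j^2 a_j e^{-j\sigma}, \quad n = 1, 2, \ldots, \tag{3.12}$$

$$\underline{\rho}_n^{(r)} = \sum_{j=1}^{r} j^3 a_j e^{-j\sigma}, \qquad \overline{\rho}_n^{(r)} = \sum_{j=r}^{n} j^3 a_j e^{-j\sigma}, \quad n = 1, 2, \ldots \tag{3.13}$$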

Now we choose in (3.6) (resp. (3.7)) the parameter $\sigma$ equal to the unique solution of the
equations (3.14) (resp. (3.15)) below:

$$\underline{M}_n^{(r)}(\sigma) = n \tag{3.14}$$
and

$$\overline{M}_n^{(r)}(\sigma) = n. \tag{3.15}$$

The existence and uniqueness of the solution (in $\sigma$) of each of the two equations (3.14), (3.15),
for any given $1 \le r \le n$ and $n = 1, 2, \ldots$, follow from the assumption that $a_j > 0$, $j = 1, 2, \ldots$.
The idea of the above choice of the free parameter $\sigma$, that goes back to Khintchine's book
[22], is to evaluate the probabilities in (3.9), (3.10) when $n$ is the "most probable value" of the
sums $\underline{Y}_n^{(r)}$, $\overline{Y}_n^{(r)}$. This makes the exponential factor in the expression of the normal density
equal to 1, which will enable us to obtain the principal term in the asymptotic expansions for
the above probabilities. We will assume further on that $a \in \mathcal{F}(l)$, $l > 0$, and denote by $\underline{\sigma}_n^{(r)}$,
$\overline{\sigma}_n^{(r)}$ the solutions of (3.14), (3.15) correspondingly.
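Since the sums in (3.11) are strictly decreasing in $\sigma$ when the $a_j$ are positive, the solutions of (3.14), (3.15) can be located numerically by bisection. The following is a minimal sketch under that monotonicity assumption; the function names, the bracketing strategy and the test sequence $a_j = 1$ are illustrative choices and not part of the paper.

```python
import math


def mean_m(a, lo, hi, sigma):
    """M(sigma) = sum_{j=lo}^{hi} j * a(j) * exp(-j * sigma), as in (3.11)."""
    return sum(j * a(j) * math.exp(-j * sigma) for j in range(lo, hi + 1))


def solve_sigma(a, lo, hi, n, tol=1e-12):
    """Bisection for the root of M(sigma) = n; M is strictly decreasing for a(j) > 0."""
    left, right = -1.0, 1.0
    while mean_m(a, lo, hi, left) < n:   # expand until M(left) >= n
        left *= 2.0
    while mean_m(a, lo, hi, right) > n:  # expand until M(right) <= n
        right *= 2.0
    while right - left > tol:
        mid = 0.5 * (left + right)
        if mean_m(a, lo, hi, mid) > n:
            left = mid   # root lies to the right of mid
        else:
            right = mid
    return 0.5 * (left + right)


def a(j):
    return 1.0  # illustrative parameter function


n = 50
sigma_low_small_r = solve_sigma(a, 1, 5, n)   # sum over j = 1..5: needs sigma < 0
sigma_low_full = solve_sigma(a, 1, n, n)      # sum over j = 1..n
sigma_up = solve_sigma(a, 5, n, n)            # sum over j = 5..n
```

In this example the solution for the short lower sum is negative, because $\sum_{j \le 5} j = 15 < 50$ at $\sigma = 0$, while the other two are positive. The exponentials can overflow for strongly negative $\sigma$ and large $j$, so the sketch is meant for moderate sizes only.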



Remark 1 In statistical physics, the idea of introducing a free parameter has its roots in
the famous Darwin-Fowler asymptotic method developed in the 1920s for evaluating partition
functions and mean values of occupation numbers. A good exposition of the method is given
in f7?[ /, Ch. 6. In this method, the above quantities are expressed as complex integrals over a
circle around the origin, of an arbitrary radius (= free parameter). Evaluating the integrals by
the method of steepest descents, the free parameter is taken to be equal to the unique minimum
point in [0, 1] of the integrand. In the preface to his book [29] Khintchine writes that the main
novelty of his approach consists of replacing "the complicated analytical apparatus (the method
of Darwin-Fowler) by the well developed limit theorems of the theory of probability that
can form the analytical basis for all the computational formulas of statistical physics."
Finally, notice that a probabilistic method for the study of asymptotic problems arising in
enumeration of permutations was quite independently suggested in the 1940s by V. Goncharov.
Subsequently the method was extensively developed by generations of researchers who applied
it to general combinatorial structures. The history of this line of research can be found in
Kolchin's book [31].

Remark 2 As we already mentioned, a specific feature inherent in Khintchine's method is
that the free parameter $\sigma$ depends on $n$, so that the random variables $X_j$, $j = 1, 2, \ldots$ form a
triangular array. In the case of an array the conditions for a normal local limit theorem are not
known. For this reason, starting from A. Khintchine (see [29], Ch. IV) and until the present
time, the establishment of a local limit theorem for sums of random variables depending on a
free parameter required sophisticated asymptotic analysis that differed from problem to problem.
As examples, see (in chronological order) fTW of G. Freiman, \3d^, Ch. 2 of A. Postnikov, [21]
of G. Freiman and J. Pitman, I'j'JI of R. Mutafchiev, [31] of V. Kolchin, [20] of G. Freiman
and B. Granovsky and the work of G. Freiman, A. Vershik and Yu. Yakubovitz. In particular,
note that the work of J. Deshouillers, G. Freiman and W. Moran gives an example of an array of
random variables for which the local limit theorem fails though the Lyapunov condition holds.

Throughout the paper we will denote by $h, h_i$, $i = 1, 2, \ldots$ positive constants that appear in
asymptotic formulae.

The following basic property of the solutions $\underline{\sigma}_n^{(r)}$, $\overline{\sigma}_n^{(r)}$ allows for the implementation of (2.8).
The proof of it is similar to that of Lemma 3 in [20].

Lemma 1 Let $n \ge r \ge n^{\epsilon}$, for some $\epsilon > 0$. Then

$$\lim_{n \to \infty} \underline{\sigma}_n^{(r)} = 0, \qquad \lim_{n \to \infty} \overline{\sigma}_n^{(r)} = 0. \tag{3.16}$$

It is clear that the straightforward application of the summation formula (2.8) to the sums in
(3.11) - (3.13) is not possible. Our subsequent asymptotic analysis extends the one in [20] in
two different directions: from $c_n$ to $\underline{c}_n^{(r)}, \overline{c}_n^{(r)}$, $r = n^{\beta}$, $0 < \beta \le 1$, and from the smooth case
$a_n \sim n^{l-1}$, $l > 0$, to the case $a_n \sim n^{l-1}L(n)$, $l > 0$. Our main tools will be the Euler integral
test and a summation theorem of Abelian type.

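Consider the function $f(x, \sigma) = x^{l}L(x)e^{-\sigma x}$, $x \ge 0$, $\sigma \in \mathbb{R}$, $l > 0$. If $\sigma > 0$, then for sufficiently
large $x > 0$ and sufficiently small $\sigma$ the function $f$ has a maximum at the point $x = x(\sigma)$ which
is the solution of the equation

$$xL'(x) + (l - \sigma x)L(x) = 0. \tag{3.17}$$

Since (see [37], p. 7)

$$\lim_{x \to \infty} \frac{xL'(x)}{L(x)} = 0, \tag{3.18}$$

for any s.v. function $L$, the asymptotic solution of (3.17) is given by $x \sim l\sigma^{-1}$, as $\sigma \to 0^+$.
In the case $\sigma < 0$, the function $f$ is increasing in $x$ for sufficiently large $x > 0$. Since we are
interested in $r = n^{\beta}$, $0 < \beta \le 1$, Lemma 1 is valid. So, applying in both cases of $\sigma$ the integral
test to the sums $\underline{M}_n^{(r)}$, $\overline{M}_n^{(r)}$, we can rewrite (3.14) and (3.15) as

$$n = \underline{M}_n^{(r)} \sim \int_1^{r} f\big(x, \underline{\sigma}_n^{(r)}\big)\, dx = \big|\underline{\sigma}_n^{(r)}\big|^{-(l+1)} \int_{|\underline{\sigma}_n^{(r)}|}^{r|\underline{\sigma}_n^{(r)}|} t^{l} L\Big(\frac{t}{|\underline{\sigma}_n^{(r)}|}\Big) \exp\Big(-t\, \mathrm{sign}\big(\underline{\sigma}_n^{(r)}\big)\Big)\, dt, \quad l > 0, \quad n \to \infty \tag{3.19}$$
and

$$n = \overline{M}_n^{(r)} \sim \int_r^{n} f\big(x, \overline{\sigma}_n^{(r)}\big)\, dx = \big|\overline{\sigma}_n^{(r)}\big|^{-(l+1)} \int_{r|\overline{\sigma}_n^{(r)}|}^{n|\overline{\sigma}_n^{(r)}|} t^{l} L\Big(\frac{t}{|\overline{\sigma}_n^{(r)}|}\Big) \exp\Big(-t\, \mathrm{sign}\big(\overline{\sigma}_n^{(r)}\big)\Big)\, dt, \quad l > 0, \quad n \to \infty \tag{3.20}$$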



correspondingly. Next, in (2.8) we set $z = e^{-|\sigma|}$, so that $1 - z \sim |\sigma|$, as $\sigma \to 0$, and apply the
integral test to the sum $S(z)$ in the LHS, to obtain

$$\int_0^{\infty} t^{l} L\Big(\frac{t}{|\sigma|}\Big) e^{-t}\, dt \sim \Gamma(l+1) L\Big(\frac{1}{|\sigma|}\Big), \quad l > 0, \quad \sigma \to 0. \tag{3.21}$$

Now we are in a position to establish asymptotic formulae for the three key parameters
$\sigma$, $B^2$ and $\rho$ of the problem considered. To facilitate the understanding of the forthcoming
asymptotic formulae we make the following



Remark 3 Combining (3.19) and (3.20) with (3.21), it is easy to see that, for all $\beta$, $0 <
\beta \le 1$, both $\underline{\sigma}_n^{(r)}$, $\overline{\sigma}_n^{(r)}$ are

$$\le \tilde{L}(n)\, n^{-\frac{1}{l+1}}, \quad l > 0, \tag{3.22}$$

as $n \to \infty$, where $\tilde{L}$ is a s.v. function induced by the given s.v. function $L$. We will show in
due course that $n^{\frac{1}{l+1}}$ is a threshold value in the context of the problem considered.



It is plain that our objective requires the derivation of asymptotic formulae for the integrals in
the RHS's of (3.19), (3.20). The fact that $\sigma$ depends on $n$ does not allow the straightforward
application of (2.8). To achieve the above goal we make use of the following fundamental fact
in the theory of s.v. functions.

Proposition 1 ([10], Theorem 1.5.2, p. 22.) For any $b > 0$ and any s.v. $L$, the convergence

$$\Phi(x, \lambda) := \frac{(\lambda x)^{\delta} L(\lambda x)}{x^{\delta} L(x)} - \lambda^{\delta} \to 0, \quad \text{as } x \to \infty \tag{3.23}$$

is uniform in $\lambda \in [b, \infty)$, if $\delta < 0$, and is uniform in $\lambda \in (0, b]$, if $\delta > 0$ and if the function
$x^{\delta}L(x)$ is locally bounded on $[0, \infty)$.

Based on this result we prove now the following Abelian summation theorem which is a version
of Proposition 4.1.2, p. 199 in [10].
Proposition 2 Let $0 < b \le \infty$ and let $b_n \to b$, $b_n z_n \to \infty$, $n \to \infty$.
Then

$$\int_{b_n}^{\infty} e^{-t} t^{l} L(t z_n)\, dt \sim L(b_n z_n) \int_{b_n}^{\infty} e^{-t} t^{l}\, dt, \quad l > 0, \quad n \to \infty \tag{3.24}$$

and, assuming the function $x^{\delta}L(x)$ is locally bounded on $[0, \infty)$ for some $\delta > 0$,

$$\int_0^{b_n} e^{\pm t} t^{l} L(t z_n)\, dt \sim L(b_n z_n) \int_0^{b_n} e^{\pm t} t^{l}\, dt, \quad l > 0, \quad n \to \infty. \tag{3.25}$$

Proof. In Proposition 1 we set $x = z_n b_n$, $\lambda = t(b_n)^{-1}$ and write the identity

$$L(\lambda x) = \Phi(x, \lambda)\lambda^{-\delta}L(x) + L(x), \tag{3.26}$$
where $\Phi(x, \lambda)$ is as defined in (3.23). Since $\lambda \ge 1$ for all $t \ge b_n$, Proposition 1 gives

$$\Big| \int_{b_n}^{\infty} e^{-t} t^{l}\, \Phi\Big(z_n b_n, \frac{t}{b_n}\Big) \Big(\frac{t}{b_n}\Big)^{-\delta} dt \Big| \le \epsilon\, b_n^{\delta} \int_{b_n}^{\infty} e^{-t} t^{l-\delta}\, dt, \quad l > 0, \tag{3.27}$$
for all $\epsilon > 0$, $\delta < 0$ and all sufficiently large $n$. In (3.27) we have

$$b_n^{\delta} \int_{b_n}^{\infty} e^{-t} t^{l-\delta}\, dt \sim \begin{cases} h, & \text{if } b < \infty, \\ e^{-b_n} b_n^{l}, & \text{if } b = \infty, \end{cases} \tag{3.28}$$

where $h = b^{\delta} \int_b^{\infty} e^{-t} t^{l-\delta}\, dt < \infty$. Hence, the RHS of (3.27) tends to 0, as $n \to \infty$. Now we
substitute (3.26) into the LHS of (3.24) to get the first assertion. The assertion (3.25) is
proved in the same manner, by applying Proposition 1 in the case $\delta > 0$. ■

From now on, we set $\underline{r} = n^{\underline{\beta}}$, $\overline{r} = n^{\overline{\beta}}$ and assume that the limit $d := \lim_{n \to \infty} L(n)$, $0 \le d \le \infty$,
exists. The forthcoming assertions tell us that the latter assumption plays a role only for the
description of the behavior of the model at the critical point. In the case when the limit does
not exist, the above description can be obtained in terms of partial limits of $L(n)$, as $n \to \infty$.

Proposition 2 will be repeatedly used for derivation of asymptotic formulae for the key
parameters.

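Lemma 2 (a) Let $(l+1)^{-1} < \underline{\beta} \le 1$ and $0 < \overline{\beta} < (l+1)^{-1}$. Then

$$\underline{\sigma}_n^{(\underline{r})} \sim \overline{\sigma}_n^{(\overline{r})} \sim \big(\Gamma(l+1)\big)^{\frac{1}{l+1}}\, n^{-\frac{1}{l+1}} L_1(n), \quad l > 0, \quad n \to \infty, \tag{3.29}$$

where $L_1$ is a s.v. function determined by the s.v. function $L$ via the relationship

$$L_1\big(n^{l+1}\big) \sim \frac{1}{\big(L^{\frac{1}{l+1}}\big)^{*}(n)}, \quad n \to \infty. \tag{3.30}$$

(b) Let the function $x^{\delta}L(x)$, $\delta > 0$, be locally bounded on $[0, \infty)$ and let $0 < \underline{\beta} < (l+1)^{-1}$
and $(l+1)^{-1} < \overline{\beta} < 1$. Then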

rr ' 



where 



while 



- fs ^>0, n^oo, (3.31) 

TV- 

— — — logn 7logn 



/>0, n^oo, (3.33) 



n- 



where 



^ ^ fi^Uf^ 1 ,logi<r) - log(7 logn) 

7 = 7n = (f + 1)P - 1 + — i , o„ = — — . (3.34) 

log n 7 log n 

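(c) Let $\underline{\beta} = \overline{\beta} = (l+1)^{-1}$. Then the following three cases should be distinguished:

(i) If $0 < d < \infty$, then

$$\underline{\sigma}_n^{(\underline{r})} \sim \underline{A}\, n^{-\frac{1}{l+1}} L_1(n), \quad l > 0, \quad n \to \infty \tag{3.35}$$

and

$$\overline{\sigma}_n^{(\overline{r})} \sim \overline{A}\, n^{-\frac{1}{l+1}} L_1(n), \quad l > 0, \quad n \to \infty, \tag{3.36}$$

where $L_1$ is a s.v. function given by (3.30), while $\underline{A}, \overline{A} > 0$ are the unique solutions of the
equations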

A'+i= / t^e-'dt (3.37) 



^0 

and 

coo 



^ = / ^ fe-*(it (3.38) 

(ii) If $d = 0$, then $\overline{\sigma}_n^{(\overline{r})}$ is given by (3.29), while $\underline{\sigma}_n^{(\underline{r})}$ is given by (3.31), (3.32).

(iii) If $d = \infty$, then $\underline{\sigma}_n^{(\underline{r})}$ is given by (3.29), while $\overline{\sigma}_n^{(\overline{r})}$ is given by (3.33), (3.34).

Proof. Since the equations (3.14), (3.15) have unique solutions (in $\sigma$), it suffices to check
that the stated asymptotic formulae satisfy (3.19), (3.20). ■

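Corollary 1 Let $\underline{\sigma}_n^{(\underline{r})}, \overline{\sigma}_n^{(\overline{r})}$ be given as in Lemma 2. Then, as $n \to \infty$,

$$\underline{B}^2 \sim \begin{cases} h_1\, n \big(\underline{\sigma}_n^{(\underline{r})}\big)^{-1}, & \text{if } (l+1)^{-1} < \underline{\beta} \le 1 \text{ or } \underline{\beta} = (l+1)^{-1},\ d \ne 0, \\ \underline{r}\, n, & \text{if } 0 < \underline{\beta} < (l+1)^{-1} \text{ or } \underline{\beta} = (l+1)^{-1},\ d = 0, \end{cases} \tag{3.39}$$

$$\overline{B}^2 \sim \begin{cases} h_2\, n \big(\overline{\sigma}_n^{(\overline{r})}\big)^{-1}, & \text{if } 0 < \overline{\beta} < (l+1)^{-1} \text{ or } \overline{\beta} = (l+1)^{-1},\ d \ne \infty, \\ \overline{r}\, n, & \text{if } (l+1)^{-1} < \overline{\beta} < 1 \text{ or } \overline{\beta} = (l+1)^{-1},\ d = \infty, \end{cases} \tag{3.40}$$

$$\underline{\rho} \sim h_3 \frac{\underline{B}^4}{n}, \qquad \overline{\rho} \sim h_4 \frac{\overline{B}^4}{n}, \quad n \to \infty. \tag{3.41}$$

Remark 4 Lemma 2 and Corollary 1 show that $\beta = \frac{1}{l+1}$ is the critical value for the three
key parameters $\sigma$, $B^2$ and $\rho$. We will see later on that this fact has a crucial influence on the
asymptotic behavior of $\underline{c}_n^{(\underline{r})}$ and $\overline{c}_n^{(\overline{r})}$.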



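Corollary 1 implies that the following weaker (the third moment $\rho = \sum_{k} E(X_k - EX_k)^3 \le
\sum_{k} E|X_k - EX_k|^3$, see j^H], p. 278) form of Lyapunov's condition (see p. 278) holds for
the sums $\underline{Y}_n^{(\underline{r})}$ and $\overline{Y}_n^{(\overline{r})}$ of random variables defined by (3.8):

$$\frac{\underline{\rho}}{\underline{B}^3} \to 0, \qquad \frac{\overline{\rho}}{\overline{B}^3} \to 0, \quad n \to \infty. \tag{3.42}$$

Recall that Lyapunov's condition is sufficient for the convergence to the normal law in the
central limit theorem for independent random variables. Our next result shows that, for the
triangular array considered, even a weaker form (3.42) of this condition is sufficient for the
same convergence in the local limit theorem.

Theorem 2: Local limit theorem.

Let $a \in \mathcal{F}(l)$, $l > 0$, and let $\underline{\sigma}_n^{(\underline{r})}, \overline{\sigma}_n^{(\overline{r})}$ be as in Lemma 2. Then

$$\Pr\big(\underline{Y}_n^{(\underline{r})} = n\big) \sim \big(2\pi \underline{B}^2\big)^{-\frac{1}{2}}, \quad n \to \infty, \tag{3.43}$$

$$\Pr\big(\overline{Y}_n^{(\overline{r})} = n\big) \sim \big(2\pi \overline{B}^2\big)^{-\frac{1}{2}}, \quad n \to \infty. \tag{3.44}$$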
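The local limit relation (3.43) can be probed numerically in the simplest case $r = n$. The sketch below is only a plausibility check under assumptions of our own: it takes $a_j = 1$ (so $l = 1$, $L \equiv 1$), solves (3.14) by bisection, obtains $\Pr(\underline{Y}_n^{(n)} = n)$ from (3.6) as $c_n \exp(-n\sigma - \underline{S}_n^{(n)}(e^{-\sigma}))$, and returns its ratio to $(2\pi \underline{B}^2)^{-1/2}$. The function name `local_limit_ratio` is hypothetical.

```python
import math


def local_limit_ratio(n):
    """Ratio Pr(Y = n) / (2 pi B^2)^(-1/2) for a_j = 1 and r = n (illustrative case)."""
    # c_0..c_n for g(z) = exp(z + z^2 + ... + z^n), via n c_n = sum_k k a_k c_{n-k}.
    c = [1.0]
    for m in range(1, n + 1):
        c.append(sum(k * c[m - k] for k in range(1, m + 1)) / m)

    def mean(sigma):
        return sum(j * math.exp(-j * sigma) for j in range(1, n + 1))

    # mean(0) = n(n+1)/2 >= n and mean(1) < 1 <= n, so the root lies in [0, 1].
    left, right = 0.0, 1.0
    for _ in range(80):
        mid = 0.5 * (left + right)
        if mean(mid) > n:
            left = mid
        else:
            right = mid
    sigma = 0.5 * (left + right)

    s_val = sum(math.exp(-j * sigma) for j in range(1, n + 1))        # S(e^{-sigma})
    b2 = sum(j * j * math.exp(-j * sigma) for j in range(1, n + 1))   # B^2 from (3.12)
    prob = c[n] * math.exp(-n * sigma - s_val)                         # Pr(Y = n)
    return prob * math.sqrt(2.0 * math.pi * b2)


ratio_200 = local_limit_ratio(200)
```

A ratio close to 1 for moderately large $n$ is what (3.43) predicts in this case; the sketch says nothing about the rate of convergence or about other choices of $a$.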

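Proof. Our objective will be to derive the asymptotic behavior of the integrals $\underline{T}$ and $\overline{T}$ given
by (3.9) and (3.10) respectively. The integrands in (3.9) and (3.10) are periodic with period
1. So, for any $\alpha_0$, $0 < \alpha_0 \le 1/2$, the integrals can be written as

$$\underline{T} = \underline{T}_1 + \underline{T}_2, \qquad \overline{T} = \overline{T}_1 + \overline{T}_2, \tag{3.45}$$

where $\underline{T}_1 = \underline{T}_1(\alpha_0)$, $\overline{T}_1 = \overline{T}_1(\alpha_0)$ and $\underline{T}_2 = \underline{T}_2(\alpha_0)$, $\overline{T}_2 = \overline{T}_2(\alpha_0)$ are integrals of the integrands
in (3.9), (3.10) over the sets $[-\alpha_0, \alpha_0]$ and $[-1/2, -\alpha_0] \cup [\alpha_0, 1/2]$ respectively. Following the
approach of [21], [20], we will first show that for an appropriate choice of $\alpha_0 = \alpha_0(n)$, the
main contributions, as $n \to \infty$, to $\underline{T}$ and $\overline{T}$ come from $\underline{T}_1$ and $\overline{T}_1$ respectively. From (3.3) -
(3.5) we have for $\alpha \in \mathbb{R}$,

$$\varphi_j(\alpha) = \sum_{k=0}^{\infty} \frac{a_j^k e^{-jk\sigma} e^{2\pi i \alpha jk}}{k! \exp\big(a_j e^{-j\sigma}\big)} = \exp\Big( a_j e^{-j\sigma}\big(e^{2\pi i \alpha j} - 1\big) \Big) \tag{3.46}$$

and

$$\underline{\varphi}^{(\underline{r})}(\alpha) = \exp\Big( \sum_{j=1}^{\underline{r}} a_j e^{-j\underline{\sigma}_n^{(\underline{r})}} \big(e^{2\pi i \alpha j} - 1\big) \Big), \tag{3.47}$$

$$\overline{\varphi}^{(\overline{r})}(\alpha) = \exp\Big( \sum_{j=\overline{r}}^{n} a_j e^{-j\overline{\sigma}_n^{(\overline{r})}} \big(e^{2\pi i \alpha j} - 1\big) \Big). \tag{3.48}$$
Substituting the Taylor expansion (in $\alpha$)

$$e^{2\pi i \alpha j} - 1 = 2\pi i \alpha j - 2\pi^2 \alpha^2 j^2 + O(\alpha^3 j^3), \quad \text{as } \alpha \to 0, \tag{3.49}$$
in (3.47), (3.48) and employing (3.14), (3.15), gives

$$\underline{\varphi}^{(\underline{r})}(\alpha) e^{-2\pi i \alpha n} = \exp\big( -2\pi^2 \alpha^2 \underline{B}^2 + O(\alpha^3 \underline{\rho}) \big), \quad \text{as } \alpha \to 0, \tag{3.50}$$
$$\overline{\varphi}^{(\overline{r})}(\alpha) e^{-2\pi i \alpha n} = \exp\big( -2\pi^2 \alpha^2 \overline{B}^2 + O(\alpha^3 \overline{\rho}) \big), \quad \text{as } \alpha \to 0. \tag{3.51}$$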

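We write now

$$\underline{\alpha}_0^3\, \underline{\rho} = \big(\underline{\alpha}_0 \underline{B}\big)^3 \frac{\underline{\rho}}{\underline{B}^3}, \qquad \overline{\alpha}_0^3\, \overline{\rho} = \big(\overline{\alpha}_0 \overline{B}\big)^3 \frac{\overline{\rho}}{\overline{B}^3} \tag{3.52}$$
to conclude that, by (3.42), there exist $\underline{\alpha}_0 = \underline{\alpha}_0(n)$, $\overline{\alpha}_0 = \overline{\alpha}_0(n)$ such that

$$\lim_{n \to \infty} \underline{\alpha}_0 \underline{B} = \lim_{n \to \infty} \overline{\alpha}_0 \overline{B} = +\infty \tag{3.53}$$

and

$$\lim_{n \to \infty} \underline{\alpha}_0^3\, \underline{\rho} = \lim_{n \to \infty} \overline{\alpha}_0^3\, \overline{\rho} = 0. \tag{3.54}$$

We see from (3.54) that $\underline{\alpha}_0, \overline{\alpha}_0 \to 0$, $n \to \infty$, because $\underline{\rho}, \overline{\rho} \to \infty$, $n \to \infty$, by (3.41). Also
note the fact that (3.49) holds for all $\alpha \in [-\underline{\alpha}_0, \underline{\alpha}_0] \cup [-\overline{\alpha}_0, \overline{\alpha}_0]$. As a result, we arrive at the
asymptotic formulae for the integrals $\underline{T}_1, \overline{T}_1$:

$$\underline{T}_1 \sim \int_{-\underline{\alpha}_0}^{\underline{\alpha}_0} \exp\big(-2\pi^2 \alpha^2 \underline{B}^2\big)\, d\alpha = \frac{1}{2\pi \underline{B}} \int_{-2\pi \underline{\alpha}_0 \underline{B}}^{2\pi \underline{\alpha}_0 \underline{B}} \exp\Big(-\frac{z^2}{2}\Big)\, dz \sim \frac{1}{\sqrt{2\pi \underline{B}^2}}, \quad n \to \infty, \tag{3.55}$$

$$\overline{T}_1 \sim \int_{-\overline{\alpha}_0}^{\overline{\alpha}_0} \exp\big(-2\pi^2 \alpha^2 \overline{B}^2\big)\, d\alpha = \frac{1}{2\pi \overline{B}} \int_{-2\pi \overline{\alpha}_0 \overline{B}}^{2\pi \overline{\alpha}_0 \overline{B}} \exp\Big(-\frac{z^2}{2}\Big)\, dz \sim \frac{1}{\sqrt{2\pi \overline{B}^2}}, \quad n \to \infty. \tag{3.56}$$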

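Now we turn to the estimation, as $n \to \infty$, of the integrals $\underline{T}_2, \overline{T}_2$.
We have

$$|\underline{T}_2| \le 2 \Big| \int_{\underline{\alpha}_0}^{1/2} \underline{\varphi}^{(\underline{r})}(\alpha) e^{-2\pi i \alpha n}\, d\alpha \Big|, \qquad |\overline{T}_2| \le 2 \Big| \int_{\overline{\alpha}_0}^{1/2} \overline{\varphi}^{(\overline{r})}(\alpha) e^{-2\pi i \alpha n}\, d\alpha \Big|. \tag{3.57}$$

It follows from (3.47), (3.48) that

$$\big|\underline{\varphi}^{(\underline{r})}(\alpha)\big| = \exp\Big( -2 \sum_{j=1}^{\underline{r}} a_j e^{-j\underline{\sigma}_n^{(\underline{r})}} \sin^2 \pi \alpha j \Big), \quad \alpha \in \mathbb{R}, \tag{3.58}$$

$$\big|\overline{\varphi}^{(\overline{r})}(\alpha)\big| = \exp\Big( -2 \sum_{j=\overline{r}}^{n} a_j e^{-j\overline{\sigma}_n^{(\overline{r})}} \sin^2 \pi \alpha j \Big), \quad \alpha \in \mathbb{R}. \tag{3.59}$$

We denote

$$\underline{V}(\alpha) = 2 \sum_{j=1}^{\underline{r}} a_j e^{-j\underline{\sigma}_n^{(\underline{r})}} \sin^2 \pi \alpha j, \quad \underline{\alpha}_0 \le \alpha \le 1/2, \tag{3.60}$$

$$\overline{V}(\alpha) = 2 \sum_{j=\overline{r}}^{n} a_j e^{-j\overline{\sigma}_n^{(\overline{r})}} \sin^2 \pi \alpha j, \quad \overline{\alpha}_0 \le \alpha \le 1/2. \tag{3.61}$$

For the sake of estimating the sums $\underline{V}$, $\overline{V}$ we make use of the following inequality proven
in ini:

$$2 \sum_{j=p}^{p+k-1} \sin^2 \pi \alpha j \ge \frac{k}{2} \min\{1, (\alpha k)^2\}, \quad |\alpha| \le 1/2, \quad \forall k \ge 2,\ p \ge 1. \tag{3.62}$$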

We set . . - 

^0 = — ^2 — ' "0 = — — , (3-63) 

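and apply (3.62) with

$$\underline{k} = l\big(|\underline{\sigma}_n^{(\underline{r})}|\big)^{-1}, \qquad \overline{k} = l\big(|\overline{\sigma}_n^{(\overline{r})}|\big)^{-1} \tag{3.64}$$

and different $\underline{p}, \overline{p}$. (Note that under the choice (3.63) of $\underline{\alpha}_0, \overline{\alpha}_0$, the conditions (3.53), (3.54)
indeed hold.) Treating separately the cases (a), (b) and (c) in Lemma 2, we are able to show
that

$$e^{-\underline{V}(\alpha)} = o\big(\underline{B}^{-1}\big), \quad \underline{\alpha}_0 \le \alpha \le 1/2, \quad n \to \infty, \qquad \text{and} \qquad e^{-\overline{V}(\alpha)} = o\big(\overline{B}^{-1}\big), \quad \overline{\alpha}_0 \le \alpha \le 1/2, \quad n \to \infty. \tag{3.65}$$

By (3.57) - (3.61), this gives $\underline{T}_2 = o(\underline{B}^{-1})$ and $\overline{T}_2 = o(\overline{B}^{-1})$, which together with (3.55), (3.56)
proves (3.43), (3.44). ■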



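Corollary 2: Asymptotic formulae for $\underline{c}_n^{(\underline{r})}$, $\overline{c}_n^{(\overline{r})}$.

Let $\underline{r} = n^{\underline{\beta}}$, $0 < \underline{\beta} \le 1$, and $\overline{r} = n^{\overline{\beta}}$, $0 < \overline{\beta} < 1$. Then

$$\underline{c}_n^{(\underline{r})} \sim \big(2\pi \underline{B}^2\big)^{-\frac{1}{2}} \exp\Big( \underline{S}_n^{(\underline{r})}\big(e^{-\underline{\sigma}_n^{(\underline{r})}}\big) + n \underline{\sigma}_n^{(\underline{r})} \Big), \quad n \to \infty, \tag{3.66}$$

$$\overline{c}_n^{(\overline{r})} \sim \big(2\pi \overline{B}^2\big)^{-\frac{1}{2}} \exp\Big( \overline{S}_n^{(\overline{r})}\big(e^{-\overline{\sigma}_n^{(\overline{r})}}\big) + n \overline{\sigma}_n^{(\overline{r})} \Big), \quad n \to \infty, \tag{3.67}$$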

where 



2 

^(^")(e-^"'')~/i|^, n-.oo. (3.68) 

Proof. By Theorem 2 and (3.6), (3.7) we get the asymptotic expressions (3.66), (3.67), while
(3.68) is obtained with the help of the integral test, Proposition 2, Lemma 2 and Corollary 1. ■



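Theorem 3: The limiting behavior of $\underline{d}_n^{(\underline{r})}$, $\overline{d}_n^{(\overline{r})}$.
Denote $\underline{c}_n^{(n)} = c_n$.

(i) Let $\underline{r} = n^{\underline{\beta}}$, $0 < \underline{\beta} \le 1$. Then

$$\lim_{n \to \infty} \underline{d}_n^{(\underline{r})} = \begin{cases} 0, & \text{if } 0 < \underline{\beta} < (l+1)^{-1} \text{ or } \underline{\beta} = (l+1)^{-1},\ d < \infty, \\ 1, & \text{if } (l+1)^{-1} < \underline{\beta} \le 1 \text{ or } \underline{\beta} = (l+1)^{-1},\ d = \infty. \end{cases} \tag{3.69}$$

(ii) Let $\overline{r} \ge 2$. Then

$$\lim_{n \to \infty} \overline{d}_n^{(\overline{r})} = \begin{cases} 0, & \text{if } \overline{r} = n^{\overline{\beta}},\ 0 < \overline{\beta} < 1, \\ \exp\Big( -\sum_{j=1}^{\overline{r}-1} a_j \Big), & \text{if } \overline{r} \ge 2 \text{ is a fixed number.} \end{cases} \tag{3.70}$$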

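Proof: (i) Denote

$$\underline{\Delta}_n^{(\underline{r})} = \underline{S}_n^{(\underline{r})}\big(e^{-\underline{\sigma}_n^{(\underline{r})}}\big) - \underline{S}_n^{(n)}\big(e^{-\underline{\sigma}_n^{(n)}}\big) + n\big(\underline{\sigma}_n^{(\underline{r})} - \underline{\sigma}_n^{(n)}\big). \tag{3.71}$$
Our objective will be to demonstrate that

$$\lim_{n \to \infty} \underline{\Delta}_n^{(\underline{r})} = \begin{cases} -\infty, & \text{if } 0 < \underline{\beta} < (l+1)^{-1} \text{ or } \underline{\beta} = (l+1)^{-1},\ d < \infty, \\ 0, & \text{if } (l+1)^{-1} < \underline{\beta} \le 1 \text{ or } \underline{\beta} = (l+1)^{-1},\ d = \infty. \end{cases} \tag{3.72}$$

The first part of (3.72) follows in a straightforward way from the preceding asymptotic analysis.
In the case corresponding to the second part,

$$\underline{\sigma}_n^{(\underline{r})} \sim \underline{\sigma}_n^{(n)}, \quad n \to \infty, \tag{3.73}$$

and, consequently,

$$\underline{S}_n^{(\underline{r})}\big(e^{-\underline{\sigma}_n^{(\underline{r})}}\big) \sim \underline{S}_n^{(n)}\big(e^{-\underline{\sigma}_n^{(n)}}\big), \quad n \to \infty. \tag{3.74}$$

Therefore, here a more subtle analysis is required. From the identity

$$\underline{M}_n^{(\underline{r})}\big(\underline{\sigma}_n^{(\underline{r})}\big) - \underline{M}_n^{(n)}\big(\underline{\sigma}_n^{(n)}\big) = 0, \quad \underline{r} = n^{\underline{\beta}}, \quad 1 \ge \underline{\beta} > \frac{1}{l+1}, \quad n = 2, 3, \ldots \tag{3.75}$$

and the fact that $\underline{\sigma}_n^{(n)} > \underline{\sigma}_n^{(\underline{r})} > 0$, we derive that $n\big(\underline{\sigma}_n^{(n)} - \underline{\sigma}_n^{(\underline{r})}\big) \to 0$, $n \to \infty$. Similarly, it can
be proven that in the case considered

$$\underline{S}_n^{(\underline{r})}\big(e^{-\underline{\sigma}_n^{(\underline{r})}}\big) - \underline{S}_n^{(n)}\big(e^{-\underline{\sigma}_n^{(n)}}\big) \to 0, \quad n \to \infty. \tag{3.76}$$

Combining this with the asymptotic formulae in Corollary 1 and Corollary 2, proves the second
part of (3.72).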

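(ii) We outline only the proof of the second part of (3.70). We have for a fixed $\overline{r}$,

$$\overline{\sigma}_n^{(\overline{r})} \sim \overline{\sigma}_n^{(1)}, \quad \text{and} \quad \lim_{n \to \infty} n\big(\overline{\sigma}_n^{(1)} - \overline{\sigma}_n^{(\overline{r})}\big) = 0. \tag{3.77}$$

Next we write

$$\overline{S}_n^{(\overline{r})}\big(e^{-\overline{\sigma}_n^{(\overline{r})}}\big) - \overline{S}_n^{(1)}\big(e^{-\overline{\sigma}_n^{(1)}}\big) = \sum_{j=\overline{r}}^{n} a_j \Big( e^{-j\overline{\sigma}_n^{(\overline{r})}} - e^{-j\overline{\sigma}_n^{(1)}} \Big) - \sum_{j=1}^{\overline{r}-1} a_j e^{-j\overline{\sigma}_n^{(1)}}. \tag{3.78}$$

Since $\overline{\sigma}_n^{(1)} > \overline{\sigma}_n^{(\overline{r})} > 0$ and

$$e^{-j\big(\overline{\sigma}_n^{(\overline{r})} - \overline{\sigma}_n^{(1)}\big)} - 1 = \big(\overline{\sigma}_n^{(1)} - \overline{\sigma}_n^{(\overline{r})}\big)\, j\, (1 - \delta_n), \quad 1 \le j \le n, \tag{3.79}$$
where $\delta_n = \delta_n(j) \to 0$, $n \to \infty$, uniformly in $1 \le j \le n$, we get the desired claim. ■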
4 Application 1: Reversible coagulation-fragmentation
processes.

We follow the formulation of the model given in ^H]. A population of $n$ particles is partitioned
into groups of various sizes that undergo stochastic evolutions (in time) of coagulation and
fragmentation. There are only two possible interactions: coagulation of two groups into one, and
fragmentation of one group into two groups. The process of coagulation-fragmentation (CFP)
is a time-homogeneous interacting particle system $\varphi_t^{(n)}$, $t \ge 0$, defined as follows. For a given
$n$, denote by $\eta = (k_1, \ldots, k_n)$ a partition of the whole population $n$ into $k_i$ groups of size
$i$, $i = 1, 2, \ldots, n$, where the numbers of groups $k_i \ge 0$ are subject to the condition:

$$\sum_{i=1}^{n} i k_i = n, \tag{4.1}$$

called the total mass conservation law. The finite set $\Omega_n = \{\eta\}$ of all partitions of $n$ is the
state space of the process $\varphi_t^{(n)}$, $t \ge 0$. The rates of the infinitesimal (in time) transitions (=
flips) are assumed to depend only on the sizes of the interacting groups, and are given by two
functions $\psi$ and $\phi$:

1. For $i$ and $j$ such that $i + j \le n$, the rate of coagulation, $(i, j) \to (i + j)$, of two groups
of sizes $i$ and $j$ into one group of size $i + j$, equals $\psi(i, j)$.

2. The rate of fragmentation, $(i + j) \to (i, j)$, of a group of size $i + j$ into two groups of
sizes $i$ and $j$, equals $\phi(i, j)$.

Hereafter, we refer to the coagulation and fragmentation rates $\psi$ and $\phi$ as intensities. The
intensities are required to satisfy $\psi(i, j) = \psi(j, i) \ge 0$ and $\phi(i, j) = \phi(j, i) \ge 0$. We also make
the natural assumption that the total intensities of merging $\Psi(i, j; \eta)$ and splitting $\Phi(i, j; \eta)$
at a configuration $\eta \in \Omega_n$ are given by:

$$\begin{aligned} \Psi(i, j; \eta) &= \Psi(i, j; k_i, k_j) = \psi(i, j)\, (k_i k_j)^{\gamma}, \quad i \ne j, \quad 2 \le i + j \le n, \\ \Psi(i, i; \eta) &= \Psi(i, i; k_i, k_i) = \psi(i, i)\, \big(k_i (k_i - 1)\big)^{\gamma}, \quad 2 \le 2i \le n, \\ \Phi(i, j; \eta) &= \Phi(i, j; k_{i+j}) = \phi(i, j)\, (k_{i+j})^{\gamma}, \quad 2 \le i + j \le n, \end{aligned} \tag{4.2}$$

where $\gamma > 0$. Note that the case $\gamma = 1$ corresponds to the mass action kinetics.
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In this paper, we study only reversible CFP's with nonzero intensities. It is known ( jT3], PU] ) 
that the class of such processes is characterized by the following property of the ratio of their 
intensities: 

... .X = — -, ^<i + J<n, 
(p{hJ> aidj 

where a = {ai = a{i)}, i = 1, 2, ... is a positive function. It is also known 
(|13j. |3Uj ) that, under ()4.3|) . the unique invariant measure fin on fi„ is given by 

(4.4) 

Here C~^{a) = := is the partition function for the probabihty measure fin, n > 1: 

^=E(I^i^^. . = ...Uen„. (4,5) 

The measure fin is the steady state of the reversible CFP considered. So, (j4.4p tells us that 
for a fixed n, the steady state is determined by n values of the function a. In view of this, it 
is natural to call a the parameter function of the process. Note that in contrast to the above, 
the transient behavior of the CFP's considered depends on the intensities ip and 0, rather 
than on their ratios. 

Remark 5 The measure fin is invariant under the following transformation of the parameter 
function a. Define the family of operators H^, h > on a set of parameter functions a : 

iHha)ij)=h^a„ J = 1,2,..., h > 0. 

It follows from and ^-4] ) that (with the obvious abuse of notation) 

Hhfin = fJ'u, h> 0. (4.6) 

This says that all results of the present paper are extended to the class of parameter functions 
{Hha : h > 0, a G I > 0}. (|^.6| ) also explains the possibility of introducing a free 

parameter for the treatment of problems related to fin- 

Our study is devoted exclusively to the steady state of the above CFP's, in the case when in 
fl4.4|) . 7 = 1 and n — oo. Treating S given by ()2.7p as a generating function of the positive 
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(4.3) 



sequence {a„}f^, such that the radius of convergence of the series ()2.7|) equals 1, it is known 
(see e.g. [T^) that the sequence {c„}j^ in ()4.5|) is generated by the function g defined by 

$$g(z) = e^{S(z)} = \sum_{n=0}^{\infty} c_n z^n, \quad |z| < 1. \qquad (4.7)$$
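
For numerical experiments it is convenient to turn $g = e^{S}$ into a recurrence. Differentiating gives $g' = S' g$, and comparing the coefficients of $z^{n-1}$ yields $n c_n = \sum_{k=1}^{n} k a_k c_{n-k}$, with $c_0 = 1$. The sketch below implements this recurrence in exact rational arithmetic; the function name `partition_function` is ours.

```python
from fractions import Fraction


def partition_function(a, n_max):
    """Return [c_0, ..., c_{n_max}], where sum_n c_n z^n = exp(sum_{j>=1} a(j) z^j).

    Uses n * c_n = sum_{k=1}^{n} k * a(k) * c_{n-k}, which follows from g' = S' g.
    """
    c = [Fraction(1)]
    for n in range(1, n_max + 1):
        total = sum(k * Fraction(a(k)) * c[n - k] for k in range(1, n + 1))
        c.append(total / n)
    return c


# Sanity check: a_j = 1/j gives S(z) = -log(1 - z), hence g(z) = 1/(1 - z) and c_n = 1.
c_log = partition_function(lambda j: Fraction(1, j), 8)

# Sanity check: only a_1 = 1 is nonzero, so g(z) = e^z and c_n = 1/n!.
c_exp = partition_function(lambda j: 1 if j == 1 else 0, 4)
```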

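To formulate the problem of clustering in the setting of CFP, we define on the probability space
$(\Omega_n, \mu_n)$ the random variables $K_i = K_i^{(n)}(\eta)$ = the number of groups of size i, $i = 1, \ldots, n$, in
a random partition $\eta \in \Omega_n$, and let $\bar q_n = \bar q_n(\eta)$ (resp. $\underline q_n = \underline q_n(\eta)$) be the size of the largest
(resp. smallest) group. We will be interested in the probabilities $\Pr(\bar q_n \le r)$ and $\Pr(\underline q_n \ge r)$.
Making use of the notation in (2.13), we have

$$c_n^{(r)} = \sum_{\eta \in \Omega_n:\ \bar q_n(\eta) \le r} \frac{a_1^{k_1} a_2^{k_2} \cdots a_n^{k_n}}{k_1! k_2! \cdots k_n!}, \qquad \tilde c_n^{(r)} = \sum_{\eta \in \Omega_n:\ \underline q_n(\eta) \ge r} \frac{a_1^{k_1} a_2^{k_2} \cdots a_n^{k_n}}{k_1! k_2! \cdots k_n!}. \qquad (4.8)$$

This gives

$$\Pr(\bar q_n \le r) = \frac{c_n^{(r)}}{c_n}, \qquad \Pr(\underline q_n \ge r) = \frac{\tilde c_n^{(r)}}{c_n}. \qquad (4.9)$$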
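
The ratios in (4.9) are easy to evaluate for moderate n: restricting the group sizes amounts to setting $a_j = 0$ outside the allowed range and recomputing the partition function. The following sketch assumes $\gamma = 1$ and uses the recurrence $n c_n = \sum_{k=1}^{n} k a_k c_{n-k}$ implied by (4.7); the function names are illustrative and the events are taken to be "largest group $\le r$" and "smallest group $\ge r$".

```python
from fractions import Fraction


def coeffs(a, n_max):
    """Coefficients c_0..c_{n_max} of exp(sum_j a(j) z^j), computed exactly."""
    c = [Fraction(1)]
    for n in range(1, n_max + 1):
        c.append(sum(k * Fraction(a(k)) * c[n - k] for k in range(1, n + 1)) / n)
    return c


def prob_largest_at_most(a, n, r):
    """Probability that every group has size <= r (a_j is set to 0 for j > r)."""
    truncated = lambda j: a(j) if j <= r else 0
    return coeffs(truncated, n)[n] / coeffs(a, n)[n]


def prob_smallest_at_least(a, n, r):
    """Probability that every group has size >= r (a_j is set to 0 for j < r)."""
    truncated = lambda j: a(j) if j >= r else 0
    return coeffs(truncated, n)[n] / coeffs(a, n)[n]


# Worked case a_j = 1, n = 3: the partitions {3}, {2,1}, {1,1,1} have weights 1, 1, 1/6.
ones = lambda j: 1
p_largest_le_2 = prob_largest_at_most(ones, 3, 2)      # (1 + 1/6) / (13/6)
p_smallest_ge_2 = prob_smallest_at_least(ones, 3, 2)   # 1 / (13/6)
```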

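We assume now that $a \in \mathcal{F}(l)$, $l > 0$, and $r = n^\beta$, $0 < \beta < 1$. Then Theorem 3 admits the
following interpretation:

$$\lim_{n \to \infty} \Pr(\bar q_n \le n^\beta) = \begin{cases} 0, & \text{if } 0 < \beta < (l+1)^{-1} \text{ or } \beta = (l+1)^{-1},\ d < \infty, \\ 1, & \text{if } (l+1)^{-1} < \beta < 1 \text{ or } \beta = (l+1)^{-1},\ d = \infty, \end{cases} \qquad (4.10)$$

while

$$\lim_{n \to \infty} \Pr(\underline q_n \ge r) = \begin{cases} 0, & \text{if } r = n^\beta,\ 0 < \beta < 1, \\ \exp\Bigl(-\sum_{j=1}^{r-1} a_j\Bigr), & \text{if } r \ge 2 \text{ is a fixed number.} \end{cases} \qquad (4.11)$$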

Remark 6 (4.10) identifies $n^{\frac{1}{l+1}}$ as the threshold for the limiting distribution of the size of
the largest cluster, in the sense that

$$\frac{1}{l+1} = \inf\{\beta : \lim_{n \to \infty} \Pr(\bar q_n \le n^\beta) = 1\}.$$

We discuss the phenomenon in more detail in Remark 8, in the context of random combinatorial
structures.
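To reveal the picture of clustering at the equilibrium of the CFP's considered, we establish
one more fact.

Theorem 4 Let $a \in \mathcal{F}(l)$, $l > 0$. Then

(i)
$$\lim_{n \to \infty} \Pr\bigl(n^{\frac{1}{l+1} - \epsilon} \le \bar q_n \le n^{\frac{1}{l+1} + \epsilon}\bigr) = 1, \quad \forall \epsilon > 0. \qquad (4.12)$$

(ii) For all p such that $n^\epsilon \le p \le n^\beta$, with $\beta < \frac{1}{l+1}$ and $\epsilon > 0$,

$$\lim_{n \to \infty} \Pr(K_p = 0) = \begin{cases} 1, & \text{if } 0 < l < 1, \\ e^{-1}, & \text{if } l = 1, \\ 0, & \text{if } l > 1. \end{cases} \qquad (4.13)$$

(iii) For any two s-tuples of integers $p_1, \ldots, p_s \ge 1$ and $k_1, \ldots, k_s \ge 0$,

$$\lim_{n \to \infty} \Pr(K_{p_1} = k_1, \ldots, K_{p_s} = k_s) = \prod_{j=1}^{s} \frac{a_{p_j}^{k_j}}{k_j!} e^{-a_{p_j}}. \qquad (4.14)$$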

Proof: (4.12) follows immediately from (4.10). Next, we have

$$\Pr(K_p = 0) = \sum_{\eta \in \Omega_n:\ k_p = 0} \mu_n(\eta) := \frac{c_{n,p}}{c_n}, \quad 1 \le p \le n. \qquad (4.15)$$



Denote by $\sigma_{n,p}$, $B^2_{n,p}$ the key parameters of the asymptotics of $c_{n,p}$. Namely, $\sigma_{n,p}$ is the unique
solution in $\sigma$ of the equation



MITH^) - papC-'^P = n, / > 0, (4.16) 
and $B^2_{n,p}$ is defined correspondingly. Then in the case $1 \le p \le n^\beta$, $0 < \beta < \frac{1}{l+1}$, we have

^n,p ~ ~ (r(/ + l))~n-iTiLi(n), B„„p ~ B^\ n oo, < (4.17) 

By the reasoning used for the proof of the second part of ()3.7Up . we get from ()4.17|) . for n — oo, 

< n((T„,p - a(f)) ^ 0, and Sn,p (e"""'") - S^^ {^~~"^) + ape''"'"-' ^0, / > 0, (4.18) 
which implies (4.13). Next, the relationship

PriKp = kp) = ^Pr{Kp = 0), 1 < p < n, (4.19) 

Kp\ 

implies (4.14) for s = 1. For general s the proof is similar. ■
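
The Poisson limit for a group of fixed size can be observed numerically. Forbidding groups of size p amounts to setting $a_p = 0$, so that $\Pr(K_p = 0)$ is a ratio of two partition functions, and for fixed p the limit in (4.14) with s = 1, $k_1 = 0$ is $e^{-a_p}$. The sketch below is a floating-point illustration only, assuming $\gamma = 1$ and the recurrence $n c_n = \sum_{k=1}^{n} k a_k c_{n-k}$; the choice $a_j \equiv 1$ (so l = 1) and n = 400 is ours, and convergence is slow, with a gap governed by the factor $e^{-p\sigma}$, where $\sigma$ is of order $n^{-1/2}$ in this case.

```python
import math


def coeffs(a, n_max):
    """Floating-point coefficients c_0..c_{n_max} of exp(sum_j a(j) z^j)."""
    c = [1.0]
    for n in range(1, n_max + 1):
        # All terms are nonnegative, so the recurrence is numerically benign.
        c.append(sum(k * a(k) * c[n - k] for k in range(1, n + 1)) / n)
    return c


def prob_no_group_of_size(a, n, p):
    """Pr(K_p = 0): ratio of the partition function with a_p removed to the full one."""
    a_without_p = lambda j: 0.0 if j == p else a(j)
    return coeffs(a_without_p, n)[n] / coeffs(a, n)[n]


ones = lambda j: 1.0
estimate = prob_no_group_of_size(ones, 400, 1)  # compare with the limit exp(-a_1)
limit = math.exp(-1.0)
```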
Remark 7 (4.14) tells us that the random variables depicting numbers of groups of fixed
sizes become independent, as $n \to \infty$. This fact is in accordance with the general principle
of asymptotic independence of particles in mean-field models, that is commonly accepted (but
not rigorously proved) in statistical physics (see [20] and references therein). In the case
$a \in \mathcal{F}(l)$, $l \le 0$, the independence principle was broadly discussed in the context of random
combinatorial structures (see Remark 8).

Now we are in a position to provide a verbal description of the striking feature of clustering
for large n, in the case $a \in \mathcal{F}(l)$, $l > 0$.

• With probability 1, there are no clusters (= groups) of sizes greater than $O(n^\beta)$, $\beta > \frac{1}{l+1}$.
Moreover, with probability 1, the size of the largest group lies in the interval
$[n^{\frac{1}{l+1} - \epsilon}, n^{\frac{1}{l+1} + \epsilon}]$, $\epsilon > 0$.

On the other hand:

• The n particles are partitioned into groups of sizes not greater than $O(n^{\frac{1}{l+1}})$ in such a
way that

(i) with a positive probability there are groups of any fixed size;

(ii) the limiting probability of having a group of a size p, $p \in [n^\epsilon, n^\beta]$, $\epsilon > 0$, $0 < \beta < \frac{1}{l+1}$,
equals 0, $1 - e^{-1}$ or 1, if $0 < l < 1$, $l = 1$ or $l > 1$ respectively.

Summing up the aforementioned picture, we conclude that for large n the distribution of
clusters induced by the measure $\mu_n$ has a threshold $n^{\frac{1}{l+1}}$.

Historical remarks

It is generally accepted that the mathematical chapter of the history of CFP's traces back
to the 1917 paper [38] by M. Smoluchowski. In this seminal work the mathematical theory
of the process of pure coagulation of molecules of colloids was proposed. A deep discussion
of the physical context and implications of Smoluchowski's model was presented in Ch.III of
the classical work by S. Chandrasekhar (1943) reprinted in Observe that coagulation 

was treated by Smoluchowski as a deterministic process. In the framework of this approach,
an infinite system of differential equations describing the evolution in time of the
concentration of molecules of sizes 1, 2, ... was derived in [38]. (Note that some authors
mistakenly attribute the equations to another paper by Smoluchowski published in 1916.)
Subsequently, the equations, after being generalized to allow also fragmentations of particles,
became famous as a general model for processes of grouping and splitting in numerous fields
of applications. Efforts of generations of researchers were devoted to the intriguing
mathematical problems of existence, uniqueness and asymptotic behavior (in time and in the number
of particles) of the solutions.

It was understood a long time ago that a stochastic context could be attributed to
Smoluchowski's equations (SE). Corresponding stochastic models were independently reintroduced,
under different names, in different fields of applications (see for details the review f^). The 
paper [S] attributes to A. Marcus the first stochastic model for pure coagulation, called the 
Marcus-Lushnikov process (MLP). Extensive study of MLP was concentrated around two
subjects: the gelation phenomenon and the relation of MLP to SE. ("Gelation" is the name for the
phase transition exhibited by the formation of a giant cluster that causes the violation of the
total mass conservation law (4.1).) The main approach to these problems is based on treating
the MLP as the stochastic coalescent. A program for investigating the relationship between 
MLP (= stochastic coalescent) and SE was outlined by D. Aldous in 0. Recent progress 
in this direction was made by J. Norris in [S^, who proved that under certain conditions a 
sequence of stochastic coalescents converges weakly to the solution of the SE. The theory of 
coalescents as a tool to study limits of coagulation models as $n \to \infty$, was developed by J.
Pitman et. al (see e.g. |I3]). Parallel to this line of research, Monte Carlo algorithms based 
on MLP were developed for the numerical treatment of SE (see ^3|) and references therein). 
P. Whittle jH] proposed a reversible Markov process as a model for Flory's theory of polymer- 
ization developed in the 1940s. As a result, a system of SE (in the presence of fragmentation) 
was rediscovered for both deterministic and stochastic contexts (see also [42]). M. Aizenmann
and T. Bak IJ, also motivated by Flory's theory, proved that for the continuous (in space) 
version of SE with constant kernels of coagulation and fragmentation, the free energy of the 
system decays exponentially as time $t \to \infty$. This important fact established the validity of
Boltzmann's H-theorem for the time evolution of the system described by SE. Note that a 
general fact of increasing entropy for SE with kernels obeying the deterministic reversibility 
condition was independently proven in ^T] . 

The explicit formulation of a CFP as a Markov process on the set of partitions appears
in the monograph [30], Ch.8, by F. Kelly, which contains also the expression (4.4) for the
equilibrium distribution of reversible CFP's in the case $\gamma = 1$. (In [30] the model is called
a clustering process.) The above formulation was reintroduced by S. Gueron in [21] in the
context of animal grouping. As far as we know, Gueron, [24], was the first to notice that SE
are obtained from the Kolmogorov forward equations for the expected numbers of groups, by
neglecting correlations among the numbers $K_p$ of groups of different sizes $p = 1, 2, \ldots$. R.
Durrett, B. Granovsky and S. Gueron [13] studied the asymptotic behavior (in n) of $EK_p$ and
$\mathrm{cov}(K_{p_1}, K_{p_2})$ at the steady state (4.5) with $\gamma = 1$ and an arbitrary parameter function a.
They showed that

$$\lim_{n \to \infty} \mathrm{cov}(K_{p_1}, K_{p_2}) = 0, \qquad (4.20)$$

for any fixed $p_1 \ne p_2$, which agrees (for large n) with the assumption of independence of group
numbers of fixed sizes, in the stochastic context of SE. It was also shown that for a
wide class of the parameter functions a and a fixed p,

$$EK_p \sim k_p, \quad \text{as } n \to \infty, \qquad (4.21)$$

where $k_p$ is the equilibrium solution of the continuous version of SE. However, it was found
that both (4.20) and (4.21) fail when the group size $p = p(n) \to \infty$, as $n \to \infty$. The latter
leads to the crucial difference in the behavior of stochastic and deterministic solutions at
equilibrium. It is plain that the difference between the two models is the consequence of the
mass conservation law (4.1) that contradicts the assumption of independence.

In the paper [20] by G. Freiman and B. Granovsky, Khintchine's probabilistic method was
brought to the scenario. With the help of this method, asymptotic formulae for the partition
function for the invariant measure (4.4) were derived in the case when $a_n \sim n^{l-1}$, $l > 0$,
$n \to \infty$. In [20] one can also find a sketch of the history of Khintchine's method.

I. Jeon [27] found sufficient conditions on intensities of coagulation and fragmentation in SE
under which the gelation phenomenon occurs. Note that these conditions are not satisfied for
the reversible intensities generated by the class $\mathcal{F}(l)$, $l > 0$, of parameter functions considered
in the present paper.

The paper [22] by P. Laurencot and D. Wrzosek introduced a version of SE with coagulation
and collisional fragmentation. The latter means that the fragmentation (= breakage) occurs
only as a result of a collision of two clusters.

Essentially, all stochastic and deterministic processes discussed so far are mean-field models,
in the sense that the rates of coagulation and fragmentation depend on the sizes of interacting
groups only. J. R. Norris [35] formulated a continuum version of SE in the case when the
coagulation rates depend not only on the particle masses but also on some other characteristics
of the clusters (e.g., the shape of the cluster, the types of basic particles that form the cluster,
etc.).

5 Application 2: Random Combinatorial Structures (RCS). 

A combinatorial structure (CS) of a size n is defined as a union of components (=
nondecomposable elements) of sizes 1, 2, ..., n, and by RCS we mean the uniform probability distribution
on the finite set of all $p_n$ CS's of size n. The RCS induces the component size counting process
$K^{(n)} = (K_1^{(n)}, \ldots, K_n^{(n)})$, where $K_i = K_i^{(n)}$, $i = 1, 2, \ldots, n$, are the numbers of components (in
a randomly chosen CS) of sizes $i = 1, 2, \ldots, n$, subject to the total mass conservation law
(4.1). It was long ago understood that for a wide class of RCS's the distribution laws $\mathcal{L}$ of
the processes $K^{(n)}$ have the following common feature called the conditioning relation (for
references see the monograph Ch.2, by R. Arratia, A. Barbour, and S. Tavare and [5], by
R. Arratia and S. Tavare):

$$\mathcal{L}(K^{(n)}) = \mathcal{L}\Bigl(Z_1, \ldots, Z_n \Bigm| \sum_{i=1}^{n} i Z_i = n\Bigr), \quad n = 1, 2, \ldots, \qquad (5.22)$$

where $Z_1, Z_2, \ldots$ are independent integer valued random variables. The great importance of
the conditioning relation (5.22) is based on the following two interrelated facts that hold for
a variety of instances of RCS's.

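• The distribution of $Z_i$, $i = 1, 2, \ldots$ is of one of the following three types:
(i) Poisson($\frac{m_i x^i}{i!}$, $x > 0$), (ii) Negative binomial ($m_i$, $x^i$, $x \in (0, 1)$)
or (iii) Binomial($m_i$, $\frac{x^i}{1 + x^i}$, $x > 0$), where in all the cases x is a free parameter and $m_i$ is
the number of components of size i.

• Corresponding to the type of the distribution of $Z_i$, the relationship between the two
key sequences $\{p_n\}$ and $\{m_i\}$ has the form:

(i)
$$\sum_{n=0}^{\infty} \frac{p_n z^n}{n!} = \exp\Bigl(\sum_{i=1}^{\infty} \frac{m_i z^i}{i!}\Bigr), \qquad (5.23)$$

(ii)
$$\sum_{n=0}^{\infty} p_n z^n = \prod_{i=1}^{\infty} (1 - z^i)^{-m_i}, \qquad (5.24)$$

(iii)
$$\sum_{n=0}^{\infty} p_n z^n = \prod_{i=1}^{\infty} (1 + z^i)^{m_i}. \qquad (5.25)$$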

In accordance with the above, the following three basic classes of CS's are distinguished (|1]): 
(i) assemblies, (ii) multisets and (iii) selections. 



First, we immediately see from (5.23) that assemblies are incorporated into our setting (2.10)
with $a_n$, $c_n$ having a clear combinatorial context: $a_n = \frac{m_n}{n!}$, $c_n = \frac{p_n}{n!}$, $n = 1, 2, \ldots$.
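
As an illustration of how assemblies fit the exponential setting, the sketch below recovers the numbers $p_n$ of structures from the numbers $m_i$ of components by taking $a_n = m_n/n!$, computing $c_n$ from $g = e^{S}$ via the recurrence $n c_n = \sum_{k=1}^{n} k a_k c_{n-k}$, and returning $p_n = n!\, c_n$. The two test cases (set partitions with $m_i = 1$, permutations with $m_i = (i-1)!$ cycles of length i) are standard examples chosen by us; the function name is hypothetical.

```python
from fractions import Fraction
from math import factorial


def assembly_counts(m, n_max):
    """Return [p_0, ..., p_{n_max}] for an assembly with m(i) components of size i.

    Uses a_n = m(n)/n!, the recurrence n*c_n = sum_k k*a_k*c_{n-k}, and p_n = n! * c_n.
    """
    a = lambda j: Fraction(m(j), factorial(j))
    c = [Fraction(1)]
    for n in range(1, n_max + 1):
        c.append(sum(k * a(k) * c[n - k] for k in range(1, n + 1)) / n)
    # n! * c_n is an integer for a genuine assembly; int() just drops the Fraction wrapper.
    return [int(c[n] * factorial(n)) for n in range(n_max + 1)]


# Set partitions: one component (block) of each size, so p_n are the Bell numbers.
bell = assembly_counts(lambda i: 1, 6)

# Permutations: (i-1)! cycles on i labelled points, so p_n = n!.
perms = assembly_counts(lambda i: factorial(i - 1), 5)
```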
A quite different approach leading to the relationship (5.23) is widely known in combinatorics
(for references see Ch.5). In this field, (5.23), which is called the exponential formula,
expresses the general enumerative principle for posets, that may be regarded as disjoint unions
of their connected components. In particular, $S(z) = \sum_{n=1}^{\infty} a_n z^n$, $g(z) = \sum_{n=0}^{\infty} c_n z^n$ are
called exponential generating functions for the number of connected components and for the
total number of posets, respectively. Note that in the graph theory, (5.23) is deduced from the
generalized scheme of allocation (see [31], Ch.1, by V. Kolchin), the latter being equivalent,
in effect, to the aforementioned enumerative principle.

Multisets can be also put into the framework of (2.10), by exponentiation of the generating
function for the sequence $\{m_i\}$ (for references see Ch.2 of the monograph [11], by S. Burris).
We write

$$\prod_{i=1}^{\infty} (1 - z^i)^{-m_i} = \exp\Bigl(\sum_{i=1}^{\infty} m_i \log (1 - z^i)^{-1}\Bigr) = \exp\Bigl(\sum_{n=1}^{\infty} z^n \sum_{j,k:\ jk=n} \frac{m_j}{k}\Bigr) \qquad (5.26)$$

to get from (5.24), $c_n = p_n$, $a_n = \sum_{j,k:\ jk=n} \frac{m_j}{k}$.
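
The exponentiation step can be made concrete as follows. The sketch computes $a_n = \sum_{jk=n} m_j/k$ exactly and then recovers $c_n = p_n$ through the recurrence $n c_n = \sum_{k=1}^{n} k a_k c_{n-k}$. The example $m_j = 1$ (integer partitions) is a standard one chosen by us, and the function names are illustrative.

```python
from fractions import Fraction


def multiset_parameter(m, n):
    """a_n = sum over factorizations n = j*k of m(j)/k."""
    return sum(Fraction(m(j), n // j) for j in range(1, n + 1) if n % j == 0)


def multiset_counts(m, n_max):
    """Return [p_0, ..., p_{n_max}], the number of multisets of each total size."""
    a = [Fraction(0)] + [multiset_parameter(m, n) for n in range(1, n_max + 1)]
    c = [Fraction(1)]
    for n in range(1, n_max + 1):
        c.append(sum(k * a[k] * c[n - k] for k in range(1, n + 1)) / n)
    return [int(x) for x in c]


# Integer partitions: one component of each size, so p_n is the partition number.
partition_numbers = multiset_counts(lambda j: 1, 8)

# For m_j = 1 the parameter is a_n = sigma(n)/n, e.g. a_6 = 1 + 1/2 + 1/3 + 1/6.
a_6 = multiset_parameter(lambda j: 1, 6)
```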

Thus, the counting processes $K^{(n)}$ for assemblies and multisets satisfy

$$\mathcal{L}(K^{(n)}) = \mu_n, \qquad (5.27)$$

where $\mu_n$ is the measure given by (4.4) with $\gamma = 1$ and the parametric function a is as indicated
above. Though, in the case of multisets, $a_n$ lacks a combinatorial meaning, it turns out that,
under a certain condition, the asymptotic behaviors of the two sequences $\{a_n\}$ and $\{m_j\}$ are
similar.

Proposition 3 (JW, Lemma 5.1) 

If the sequence $\{m_j\}$ in (5.26) is such that

$$\lim_{j \to \infty} \frac{m_j}{m_{j+1}} = h, \quad 0 < h < 1, \qquad (5.28)$$

then $a_j \sim m_j$, $j \to \infty$.

By virtue of Remark 5 this means that our results on clustering are applicable for multisets 
with TTij ~ j''~^L{j), j 00, l,h > 0. Now notice that applying the exponentiation 
procedure in the case of selections we arrive at an alternating sequence {a„} . This says that 
this case is beyond the scope of the setting of the present paper. 
The asymptotic behavior of counting processes was fully explored for the subclass of RCS's
characterized by the following logarithmic condition:

$$\lim_{i \to \infty} i P(Z_i = 1) = \lim_{i \to \infty} i EZ_i = \theta, \qquad (5.29)$$

for some $\theta > 0$, where the random variables $Z_i$, $i = 1, 2, \ldots$ are as in (5.22). Such RCS's are
called logarithmic. A comprehensive exposition of the research for this case is given in 
The classical example of a logarithmic RCS is the seminal Ewens sampling formula (ESF) given
by $a_n = \frac{\theta}{n}$, $n \ge 1$, $\theta > 0$. It originated in population genetics (1972) and was extensively
investigated by many authors in relation to a variety of models. In particular, it was proved
that the normalized ESF converges weakly to the Poisson - Dirichlet law (see jTH], [23 and 
references therein). The counting process induced by ESF can be interpreted as a theta-biased
random permutation ([4], Ch.3). The theory of the limiting behavior of the counting
process in the case $\theta = 1$ (= random permutations) was shaped by V. L. Goncharov (1942),
L. A. Shepp and S. P. Lloyd (1966) and A. M. Vershik and A. A. Shmidt (1977) (for references
see Ch.l and [HI, Ch.4). 
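
For the ESF the partition function is available in closed form, which gives a convenient check. With $a_n = \theta/n$ one has $S(z) = -\theta \log(1 - z)$, hence $g(z) = (1 - z)^{-\theta}$ and $c_n = \theta(\theta + 1) \cdots (\theta + n - 1)/n!$. The sketch below compares this with the recurrence $n c_n = \sum_{k=1}^{n} k a_k c_{n-k}$, which here reduces to $n c_n = \theta (c_0 + \cdots + c_{n-1})$; the function names are ours.

```python
from fractions import Fraction


def esf_partition_function(theta, n_max):
    """c_0..c_{n_max} for a_k = theta/k, using n*c_n = theta * (c_0 + ... + c_{n-1})."""
    theta = Fraction(theta)
    c = [Fraction(1)]
    for n in range(1, n_max + 1):
        c.append(theta * sum(c[:n]) / n)
    return c


def rising_factorial_formula(theta, n):
    """Closed form theta*(theta+1)*...*(theta+n-1) / n!."""
    theta = Fraction(theta)
    out = Fraction(1)
    for i in range(n):
        out *= (theta + i) / (i + 1)
    return out


half = Fraction(1, 2)
by_recurrence = esf_partition_function(half, 10)
by_formula = [rising_factorial_formula(half, n) for n in range(11)]
```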

On the other hand, integer partitions provide an example of a nonlogarithmic RCS. Partitions
can be formally defined as a multiset with $m_i = 1$, $i \ge 1$. Thus, (5.26) gives for this case

$$a_n = \sum_{k \in D_n} \frac{1}{k},$$

where $D_n$ is the set of all divisors of n. Consequently,

$$1 \le a_n \le \log n, \quad n \to \infty,$$

which indicates that the case of partitions can be approximated by the class of parametric
functions $\mathcal{F}_l$ with l = 1.

In the next section we explain that q-colored linear forests (treated as posets) is a RCS with
$a \in \mathcal{F}_l$. In the conclusion, we make the following

Remark 8 The logarithmic condition (5.29) fails for the class $\mathcal{F}_l$, $l \ne 0$, of parametric
functions a. On the other hand, the Lyapunov condition, and consequently the normal local limit
theorem, hold only when $l > 0$. This explains why in the study of the clustering problem, the
cases $l = 0$, $l > 0$ and $l < 0$ should be distinguished, with basically different asymptotic tools
being employed. The third case, that includes such RCS's as forests of labelled (unlabelled)
trees, was recently explored in [6], by A. Barbour and B. Granovsky. Correspondingly, three
very different pictures of clustering were discovered. A specific feature of clustering in the case
$l > 0$ considered in the present paper is the existence of a threshold value for the size of the
maximal component (cluster). So, this appears to be the only case (among $a \in \mathcal{F}_l$) in which
the gelation phenomenon is not seen.

In the context of RCS's, the aforementioned principle of asymptotic independence of small 
groups (= components) has been widely discussed for a long time, in connection with the 
conditioning relation \5.2i3) . The independence was proved for the logarithmic RCS's (^, 
Ch. 4) and in the case $EZ_j = j^{l-1} L(j)$, $l < 0$ in



6 Application 3: Additive number systems (ANS). 

ANS's provide a very general setting that encompasses multisets, as defined in the previous
section. Following [11] by S. Burris, an ANS is a countable free commutative monoid
$A = \{v\}$ with a given set P of nondecomposable elements (= generators) and with an additive
norm $\|\cdot\|$, such that the set

$$\{v \in A : \|v\| = n\}$$

is finite for all $n \in \mathbb{N}$. This definition implies that each $v \in A$ is a sum (= union) of elements
of P. Denoting by $c_n$, $p_n$ the number of elements in A and P correspondingly with norm n, an
enumerative argument yields the following characteristic identity for ANS's:

$$\sum_{n \ge 0} c_n x^n = \prod_{n \ge 1} (1 - x^n)^{-p_n}, \quad 0 < x < \rho < 1. \qquad (6.30)$$

By the exponentiation of the RHS of (6.30), we get the alternative version of the above
identity:

$$g(x) = \exp\Bigl(\sum_{m \ge 1} \frac{P(x^m)}{m}\Bigr), \quad 0 < x < \rho < 1, \qquad (6.31)$$

where g and P are the generating functions for the sequences $\{c_n\}$ and $\{p_n\}$ respectively:

$$g(x) = \sum_{n \ge 0} c_n x^n, \quad P(x) = \sum_{n \ge 1} p_n x^n, \quad 0 < x < \rho < 1. \qquad (6.32)$$

Now (6.31) can be rewritten as (2.10) with

$$a_n = \sum_{jm = n} \frac{p_j}{m}. \qquad (6.33)$$

As we already mentioned before, the sequence $\{a_n\}$ defined by (6.33) usually does not exhibit
a regular asymptotic behaviour, i.e. does not belong to the class $\mathcal{F}(l)$, $l > 0$.
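
The equivalence of the product form (6.30) and the exponential form with $a_n$ as in (6.33) can be verified on a small example. The sketch below computes $c_n$ in two ways: by exponentiation, using the recurrence $n c_n = \sum_{k=1}^{n} k a_k c_{n-k}$, and by expanding the product directly with $(1 - x^n)^{-p} = \sum_{k \ge 0} \binom{p + k - 1}{k} x^{nk}$. The choice $p_n = 2^n$ and the function names are assumptions made for the illustration.

```python
from fractions import Fraction
from math import comb


def ans_counts_via_exponentiation(p, n_max):
    """c_0..c_{n_max} from a_n = sum_{j*m = n} p(j)/m and g = exp(sum_n a_n x^n)."""
    a = [Fraction(0)]
    for n in range(1, n_max + 1):
        a.append(sum(Fraction(p(j), n // j) for j in range(1, n + 1) if n % j == 0))
    c = [Fraction(1)]
    for n in range(1, n_max + 1):
        c.append(sum(k * a[k] * c[n - k] for k in range(1, n + 1)) / n)
    return [int(x) for x in c]


def ans_counts_via_product(p, n_max):
    """c_0..c_{n_max} by expanding prod_n (1 - x^n)^(-p(n)) up to degree n_max."""
    c = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        if p(n) == 0:
            continue  # the factor is 1, nothing to multiply
        new = [0] * (n_max + 1)
        for deg, coeff in enumerate(c):
            if coeff == 0:
                continue
            k = 0
            while deg + n * k <= n_max:
                # comb(p + k - 1, k) counts multisets of k generators of norm n
                new[deg + n * k] += coeff * comb(p(n) + k - 1, k)
                k += 1
        c = new
    return c


p_example = lambda n: 2 ** n
via_exp = ans_counts_via_exponentiation(p_example, 8)
via_prod = ans_counts_via_product(p_example, 8)
```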
On the other hand, ANS's with $p_n$ satisfying the condition of Proposition 3 are a nice exception
to the above phenomenon. An example of such a structure is the set of q-colored linear forests,
treated as posets, in which case $p_n = q^n$ (see [11], p. 24). It is also important to note that the
radius of convergence of the generating series in (6.32) cannot be greater than 1.

We wish now to demonstrate that the known asymptotic result on ANS's that facilitated the
development of Compton's density theory is a particular case of our asymptotic formula for
$c_n$. In 1992 it was proven by J. Knopfmacher, A. Knopfmacher and R. Warlimont (see for
references [11], Theorem 5.17, p. 94) that if in (6.30),

$$p_n = h q^n + O(q_1^n), \quad h > 0, \quad q > 1, \quad 0 < q_1 < q,$$

then

$$c_n \sim h_1 q^n \frac{e^{2\sqrt{hn}}}{n^{3/4}}, \quad h_1 > 0, \quad n \to \infty, \qquad (6.34)$$

where $h_1 > 0$ is a constant which was not specified. This result was obtained with the help of
complex analysis.

By virtue of Proposition 3, we see that a„ ~ j9„, n —* 00, which together with Remark 
5, permits to apply our asymptotic formula ()3.66p with I = 1 and L{n) = h. So, in the 
case considered, L*{n) = h^^ and we have in ()3.30|) . Li{n) = h~^/'^. Consequently, by the 
asymptotic formulae in Section 3, we have ai"'' ~ n^^/'^h^/'^, S^n\^~-" ') ~ h^l'^v^l'^ — h/2 + 
0(ai"^),and {E^^f 2h-^/^n'/\ as n — s> 00. Substituting this in ()3.66j) . recovers (j6.34|l . while 
specifying hi = '(l^Y^h'^l^e^^l'^ . 

The central problem in the theory of ANS's is the study of the asymptotic density $\delta(B)$ of a
subset B of a monoid A:

$$\delta(B) = \lim_{n \to \infty} \frac{b_n}{c_n}, \qquad (6.35)$$

where $b_n$ is the number of elements of B with norm n.

It follows from ()6.33p that the quantities , dn '' in the clustering problem considered in the 
present paper can be regarded as the densities of the subsets, say $B_1, B_2 \subset A$, such that the
maximal (minimal) norm of generators of elements of $B_1$ ($B_2$) satisfies a certain condition.
The fundamental result in this field is Compton's density theorem (1989) (see [11], Ch.4, 5)
that establishes sufficient conditions for existence of an asymptotic density of all partition sets
of an ANS A.

Coming back to the example of q-colored linear forests (as posets), our Theorem 2, applied
with $l = 1$ and $L(n) = h$, gives the asymptotic density of the aforementioned sets $B_1$, $B_2$.